road sign
DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests
Corbière, Charles, Roburin, Simon, Montariol, Syrielle, Bosselut, Antoine, Alahi, Alexandre
Large vision-language models (LVLMs) augment language models with visual understanding, enabling multimodal reasoning. However, due to the modality gap between textual and visual data, they often face significant challenges, such as over-reliance on text priors, hallucinations, and limited capacity for complex visual reasoning. Existing benchmarks to evaluate visual reasoning in LVLMs often rely on schematic or synthetic images and on imprecise machine-generated explanations. To bridge the modality gap, we present DrivingVQA, a new benchmark derived from driving theory tests to evaluate visual chain-of-thought reasoning in complex real-world scenarios. It offers 3,931 expert-crafted multiple-choice problems and interleaved explanations grounded with entities relevant to the reasoning process. We leverage this dataset to perform an extensive study of LVLMs' ability to reason about complex visual scenarios. Our experiments reveal that open-source and proprietary LVLMs struggle with visual chain-of-thought reasoning under zero-shot settings. We investigate training strategies that leverage relevant entities to improve visual reasoning. Notably, we observe a performance boost of up to 7% when reasoning over image tokens of cropped regions tied to these entities.
- Europe > France (0.05)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- Education (1.00)
Automated Road Safety: Enhancing Sign and Surface Damage Detection with AI
Merolla, Davide, Latorre, Vittorio, Salis, Antonio, Boanelli, Gianluca
Public transportation plays a crucial role in our lives, and the road network is a vital component in the implementation of smart cities. Recent advancements in AI have enabled the development of advanced monitoring systems capable of detecting anomalies in road surfaces and road signs, which, if unaddressed, can lead to serious road accidents. This paper presents an innovative approach to enhance road safety through the detection and classification of traffic signs and road surface damage using advanced deep learning techniques. This integrated approach supports proactive maintenance strategies, improving road safety and resource allocation for the Molise region and the city of Campobasso. The resulting system, developed as part of the Casa delle Tecnologie Emergenti (House of Emergent Technologies) Molise (Molise CTE) research project funded by the Italian Ministry of Economic Growth (MIMIT), leverages cutting-edge technologies such as Cloud Computing and High Performance Computing with GPU utilization. It serves as a valuable tool for municipalities, enabling quick detection of anomalies and the prompt organization of maintenance operations.
- Europe > Italy > Molise > Campobasso Province > Campobasso (0.24)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > Italy > Sardinia > Cagliari (0.04)
- Overview > Innovation (0.68)
- Research Report > Promising Solution (0.66)
Adversary ML Resilience in Autonomous Driving Through Human Centered Perception Mechanisms
Physical adversarial attacks on road signs continuously exploit vulnerabilities in modern-day autonomous vehicles (AVs), impeding their ability to correctly classify the road signs they encounter. Current models cannot generalize from input data well, resulting in overfitting or underfitting. In overfitting, the model memorizes the input data but cannot generalize to new scenarios; in underfitting, the model does not learn enough from the input data to accurately classify road signs. This paper explores the resilience of autonomous driving systems against three main physical adversarial attacks (tape, graffiti, illumination), specifically targeting object classifiers. Several machine learning models were developed and evaluated on two distinct datasets: road signs (stop signs, speed limit signs, traffic lights, and pedestrian crosswalk signs) and geometric shapes (octagons, circles, squares, and triangles). The study compared algorithm performance under different conditions, including clean and adversarial training and testing on these datasets. To build robustness against attacks, defense techniques such as adversarial training and transfer learning were implemented. Results demonstrated that transfer learning played a crucial role in performance, allowing knowledge gained from shape training to improve the generalizability of road sign classification despite the datasets being completely different. The paper suggests future research directions, including human-in-the-loop validation, security analysis, real-world testing, and explainable AI for transparency. This study aims to contribute to improving the security and robustness of object classifiers in autonomous vehicles and to mitigating the impact of adversarial examples on driving systems.
- North America > United States > Pennsylvania (0.04)
- Europe > Russia (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > Russia (0.04)
- Transportation > Ground > Road (1.00)
- Information Technology > Security & Privacy (1.00)
Protecting Autonomous Cars from Phantom Attacks
Early computer vision studies aimed at developing computerized driver intelligence appeared in the mid-1980s when scientists first demonstrated a road-following robot [36]. Studies performed from the mid-1980s until 2000 established the fundamentals for automated driver intelligence in related tasks, including detection of pedestrians [39], lanes [3], and road signs [9]. However, the vast majority of initial computer vision algorithms aimed at detecting objects required developers to manually program dedicated features. The increase in computational power available in recent years changed the way AI models are created: Features are automatically extracted by training various neural network architectures on raw data. Automatic feature extraction outperformed and replaced the traditional approach of manually programming an object's features.
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks > Manufacturer (0.96)
- Information Technology > Security & Privacy (0.93)
Road signs "driving" you crazy?
You also might be wondering what all this CNN stuff is. Don't worry, I can explain. A CNN is a type of neural network (read my other articles to learn the basics) that is particularly good at image classification. CNNs are used for computer vision because they are great at detecting patterns in images, such as lines, circles, and other shapes. A CNN uses convolutional layers, which essentially learn filters that can detect patterns in an image. For example, a filter could detect vertical lines, or it could detect horizontal lines. These filters "convolve" over an image, sliding in little 3x3 (or whatever the size of the filter) chunks and taking the dot product of each chunk.
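That sliding dot product can be sketched in a few lines of plain NumPy. This is a toy illustration (not code from the article): the loop slides a 3x3 filter over a grayscale image and takes the dot product of the filter with each chunk. The "vertical edge" filter values are a standard textbook example, and note that, like most deep learning libraries, this computes cross-correlation (no filter flip), which is what CNN layers actually do in practice.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1), taking the
    dot product of the kernel with each chunk of the image."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # dot product of the filter with the current kh x kw chunk
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A filter that responds where brightness changes from left to right,
# i.e. a vertical-edge detector.
vertical_edge = np.array([[1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0]])

# A tiny "image": dark on the left, bright on the right.
image = np.array([[0.0, 0.0, 10.0, 10.0],
                  [0.0, 0.0, 10.0, 10.0],
                  [0.0, 0.0, 10.0, 10.0],
                  [0.0, 0.0, 10.0, 10.0]])

response = convolve2d(image, vertical_edge)  # strong (negative) response everywhere
```

A 4x4 image with a 3x3 filter gives a 2x2 response map; every position here responds with -30 because the whole image is one big dark-to-bright vertical edge.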
AI Don't Know Jack? – MetaDevo
Think your AI understands the meanings of words? Or understands anything at all? Guess again. There's a big issue inherent in trying to make artificial minds that understand like a human does. It's called the Symbol Grounding Problem. TLDR: How can understanding in an AI be made intrinsic to the system, rather than just parasitic on the meanings in the minds of the developers / trainers?
Could microscale concave interfaces help self-driving cars read road signs? – Physics World
A structural colour technology that produces concentric rainbows could help autonomous vehicles read road signs, scientists in the US and China claim. As well as exploring the physics of these novel reflective surfaces, the researchers show that they can produce two different image signals at the same time. Autopilot systems that read both signals would be less likely to misinterpret altered road signs, they suggest. Car autopilot systems use infrared laser-based light detection and ranging (lidar) systems to scan their environment and recognize traffic situations. To read signs, autonomous vehicles rely on visible cameras and pattern recognition algorithms.
- North America > United States (0.25)
- Asia > China (0.25)
- Transportation > Ground > Road (0.54)
- Transportation > Passenger (0.41)
- Information Technology > Robotics & Automation (0.41)
Tesla AI
Elon Musk's vision is to change Tesla from an electric car company into a robotics company, and Tesla AI Day shows his commitment to that vision. Tesla is one of the biggest automobile companies, and due to their research and implementation of AI in their cars (the self-driving system), they are as much a software company as an automobile company. Tesla as a company is deeply involved with AI in hardware that is used beyond the self-driving systems. Elon Musk wants Tesla to be seen as "much more than an electric car company." At Thursday's Tesla AI Day, the CEO described Tesla as a company with "deep AI activity in hardware on the inference level and on the training level" that can be used down the line for applications beyond self-driving cars, including a humanoid robot that Tesla is apparently building.
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks > Manufacturer (1.00)
Optical Adversarial Attack Can Change the Meaning of Road Signs
Researchers in the US have developed an adversarial attack against the ability of machine learning systems to correctly interpret what they see – including mission-critical items such as road signs – by shining patterned light onto real-world objects. In one experiment, the approach succeeded in causing the meaning of a 'STOP' roadside sign to be transformed into a '30mph' speed limit sign. Perturbations on a sign, created by shining crafted light on it, distort how it is interpreted by a machine learning system. The research is entitled Optical Adversarial Attack, and comes from Purdue University in Indiana. An OPtical ADversarial attack (OPAD), as proposed by the paper, uses structured illumination to alter the appearance of target objects, and requires only a commodity projector, a camera, and a computer. The researchers were able to successfully carry out both white-box and black-box attacks using this technique.
- Transportation (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Military (1.00)
When I train a model for days...
I'm doing a PhD in security within machine learning, and this is actually an extremely dangerous property of nearly all DNN models, due to how they 'see' data; it is exploited in many ML attacks. DNNs don't see the world as we do (obviously), but more importantly, that means images or data can appear exactly the same to us yet be completely different to a DNN. You can imagine a scenario where a DNN within an autonomous car is easily tricked into misclassifying road signs. To us, a readable STOP sign will always say STOP; even if it has scratches and dirt on it, we can easily interpret what the sign should be telling us. However, an attacker can use noise (similar to the photo of another road sign) to alter the image in tiny ways and cause a DNN to think a STOP sign is actually just a speed limit sign, while to us it still looks exactly like a STOP sign. Deploy such an attack on a self-driving car at a junction with a stop sign and you can imagine how the car would simply drive on rather than stopping. You'd be surprised how easy it is to trick AI; even big companies like YouTube have issues with this in copyright music detection if you perform complex ML attacks on the music. Here's a paper similar to the scenario I described, but using stickers placed in specific spots to make an AI not see stop signs: https://arxiv.org/pdf/1707.08945.pdf - _Waldy_
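The "tiny noise flips the decision" effect described above can be shown with a toy sketch, not taken from any of the papers mentioned: a linear classifier scoring a flattened "image" with a dot product, attacked with an FGSM-style signed-gradient step. Every name and number here is invented for illustration; real attacks target deep networks, but the mechanism is the same.

```python
import numpy as np

# Toy classifier: score = w . x; positive means "STOP", negative means
# "speed limit". Weights and input are random stand-ins for a real model.
rng = np.random.default_rng(0)
w = rng.normal(size=100)        # classifier weights
x = rng.normal(size=100)        # a flattened 100-"pixel" input
x = x * np.sign(w @ x)          # ensure the clean score is positive ("STOP")

score = w @ x                   # clean prediction: positive
# Per-pixel budget just large enough to flip the sign of the score.
eps = 2 * score / np.abs(w).sum()
# FGSM-style step: nudge every pixel by eps against the gradient of the
# score, which for a linear model is just w.
x_adv = x - eps * np.sign(w)

# Each pixel moved by at most eps, yet the decision flips:
# w . x_adv = score - eps * sum(|w|) = -score < 0.
print(f"clean score: {score:.2f}, adversarial score: {w @ x_adv:.2f}, eps: {eps:.3f}")
```

The key point mirrors the comment above: the per-pixel change (eps) is tiny compared to the pixel values themselves, so the perturbed input looks essentially unchanged, while the classifier's output crosses the decision boundary.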
- Transportation > Ground > Road (0.98)
- Transportation > Passenger (0.61)
- Information Technology > Robotics & Automation (0.61)